Abstract

Scheduling has been a central issue in computer science since the very beginning. Scheduling
questions arise naturally in many different areas, among which operating system design, compiling,
memory management, communication networks, parallel machines, cluster management, etc. In real-life
systems, the characteristics of the jobs (such as release time, processing time, etc.) are usually unknown
and/or unpredictable beforehand. In particular, the system is typically unaware of the remaining work
in each job or of the ability of the job to take advantage of more resources. Following these observations,
we adopt the job model by Edmonds et al. (2000, 2003) in which the jobs go through a sequence of
different phases. Each phase consists of a certain quantity of work with a different speed-up function
that models how it takes advantage of the number of processors it receives. In this paper, we consider
the non-clairvoyant online setting where a collection of jobs arrives at time 0. Non-clairvoyant means that
the algorithm is unaware of the phases each job goes through and is only aware that a job completes
at the time of its completion. We consider the setflowtime metric, introduced by Robert et
al. (2007). The goal is to minimize the sum of the completion times of the sets, where a set is completed
when all of its jobs are done. If the input consists of a single set of jobs, the setflowtime is simply
the makespan of the jobs; and if the input consists of a collection of singleton sets, the setflowtime is
simply the flowtime of the jobs. The setflowtime thus covers a continuous range of objective functions
from makespan to flowtime. We show that the non-clairvoyant strategy Equi∘Equi, which evenly splits
the available processors among the still unserved sets and then evenly splits these processors among
the still uncompleted jobs of each unserved set, achieves a competitive ratio of
$(2 + \sqrt{3} + o(1)) \frac{\ln n}{\ln\ln n}$ for the setflowtime minimization, and that this competitive
ratio is asymptotically optimal (up to a constant factor), where $n$ is the size of the largest set. In the
special case of a single set, we show that the non-clairvoyant strategy Equi achieves a competitive ratio
of $(1 + o(1)) \frac{\ln n}{\ln\ln n}$ for the makespan minimization problem, which is again asymptotically
optimal (up to a constant factor). This result shows in particular that, contrary to what previous studies
on malleable jobs may suggest, the assertion "Equi never starves a job" is at the same time true and
false: false, because we show that it can delay some jobs up to a factor $\frac{\ln n}{\ln\ln n}$, and true,
because we show that no algorithm (deterministic or randomized) can achieve a better stretch than
$\frac{\ln n}{4 \ln\ln n}$.

1 Introduction



Scheduling has been a central issue in computer science since the very beginning. Scheduling questions
arise naturally in many different areas, among which operating system design, compiling, memory
management, communication networks, parallel machines, cluster management, etc. Main contributions to the
field go back as far as the 1950s (e.g., [17]). It is usually assumed that all the characteristics of the
jobs are known at time 0. It turns out that in real-life systems, the characteristics of the jobs (such as
release time, processing time, etc.) are usually unknown and/or unpredictable beforehand. In particular,
the system is typically unaware of the remaining work in each job or of the ability of the job to take
advantage of more resources. A first step towards a more realistic model was to design algorithms that
are unaware of the existence of a given job before its release time [HI [10]. This gave rise to the field of
online algorithms. The cost of the solution computed by an online algorithm is measured with respect to
an optimal solution which is aware of the release dates; the maximum value of the ratio of these two costs
is called the competitive ratio of the algorithm. Later on, [12] introduced the concept of non-clairvoyant
algorithm, in the sense that the algorithm is unaware of the processing time of the jobs at the time they
are released. They show that for flowtime minimization, the competitive ratio of any non-clairvoyant
deterministic algorithm is at least $\Omega(n^{1/3})$ and that a randomized non-clairvoyant algorithm achieves a
competitive ratio of $\Omega(\log n)$. Remarking that lower bounds on the competitive ratio relied on overloading the
system, [14] proposes to compare the algorithm to an optimum solution with restricted resources. This
analysis technique, known as resource augmentation, allows [9] to show that given $(1 + \epsilon)$ more processing
power, a simple deterministic algorithm achieves a constant competitive ratio. Concerning makespan
minimization in this setting, earlier work by [8] already conformed to these restrictions and shows that
the competitive ratio of non-clairvoyant list scheduling is essentially 2, which is optimal; [5] also proposes
an optimal algorithm when there exist precedence constraints, with competitive ratio 2.6180.
Extensive experimental studies (e.g., [HI [2]) have been conducted on various scheduling heuristics. It turns
out that real jobs are not fully parallelizable and thus the models above are not adequate in practice. To
refine the model, [4, 3] introduce a very general setting for non-clairvoyance in which the jobs go through
a sequence of different phases. Each phase consists of a certain quantity of work with a speed-up function
that models how it takes advantage of the number of processors it receives. For example, during a fully
parallel phase, the speed-up function increases linearly with the number of processors received. They
prove that even if the scheduler is unaware of the characteristics of each phase, some policies achieve a
constant factor approximation of the optimal flowtime. More precisely, in [4], the authors show that
the Equi policy, introduced in the 1980s by [18] and implemented in many real systems, achieves a
competitive ratio of $(2 + \sqrt{3})$ for flowtime minimization when all the jobs arrive at time 0. [3] shows that
in this setting no non-clairvoyant scheduler can achieve a competitive ratio better than $\Omega(\sqrt{n})$ when jobs
arrive at arbitrary times, and shows that Equi achieves a constant factor approximation of the optimal
flowtime if it receives slightly more than twice as many resources as the optimal clairvoyant schedule it is
compared to. We refer the reader to the survey [1] for a current state of the field.

In this paper, we adopt the job model of [4, 3] and consider the setflowtime metric that was introduced
by [33] in the context of data broadcast scheduling with dependencies. We consider the case where a
collection of sets of jobs arrives at time 0. The goal is to minimize the sum of the completion times of the
sets, where a set is completed when all of its jobs are done. If the input consists of a single set of jobs,
the setflowtime is simply the makespan of the jobs; and if the input consists of a collection of singleton
sets, the setflowtime is simply the flowtime of the jobs. The setflowtime thus covers a continuous range of
objective functions from makespan to flowtime. This metric introduces a minimal form of dependencies
between jobs of a given set. In the special case where jobs consist of a single sequential phase followed
by a fully parallel phase (with arbitrary release dates), [15] shows that the competitive ratio of the
non-clairvoyant strategy Equi∘A, which splits the processors evenly among the uncompleted sets of jobs and
schedules the uncompleted jobs of each set within these processors according to some algorithm A, is $O(1)$
with constant resource augmentation.

As in [4], we focus in this article on the case where all the sets of jobs are released at time 0, a typical
situation for a high performance cluster that receives all the jobs from different members of an institution
at the time the institution is granted access to the cluster. We show that the non-clairvoyant strategy
Equi∘Equi, which evenly splits the available processors among the still unserved clients and then evenly
splits these processors among the still uncompleted jobs of each unserved client, achieves a competitive
ratio of $(2 + \sqrt{3} + o(1)) \frac{\ln n}{\ln\ln n}$ for the setflowtime minimization and that it is asymptotically
optimal (up to a constant factor), where $n$ is the size of the largest set (Theorem 2). In the special case of a
single set, we show that the non-clairvoyant strategy Equi achieves a competitive ratio of
$(1 + o(1)) \frac{\ln n}{\ln\ln n}$ for the makespan minimization problem, which is again asymptotically optimal
(up to a constant factor) (Theorem 1). This result shows that, contrary to what previous studies on malleable
jobs may suggest, the assertion "Equi never starves a job" is at the same time true and false: false, because we
show that it can delay some jobs up to a factor $\frac{\ln n}{\ln\ln n}$, and true, because we show that no algorithm
(deterministic or randomized) can achieve a better stretch than $\frac{\ln n}{4 \ln\ln n}$.

As a byproduct of our analysis, we extend the reduction shown by Edmonds in [3, Lemma 1]. We
show that in order to analyze the competitiveness of a non-clairvoyant scheduler in the general job phase
model, one only needs to consider jobs consisting of sequential or parallel work, whatever the objective
function is (flowtime, makespan, setflowtime, stretch, energy consumption, etc.) (Proposition 4). This last
result demonstrates that these two regimes are of the highest interest for the analysis of non-clairvoyant
schedulers, since they are much easier to handle and allow one to treat the very wide range of non-decreasing
sublinear speed-up functions all at once.

The next section introduces the model and the notations. Section 3 extends the reduction to jobs
with sequential or parallel phases, originally proved by [3]. Section 4 shows that Equi achieves an
asymptotically optimal competitive ratio for non-clairvoyant makespan minimization, and introduces the
tools that will be used in the last section to obtain the competitiveness of Equi∘Equi for non-clairvoyant
setflowtime minimization.



2 Non-clairvoyant Batch Sets Scheduling 

The problem. We consider a collection $S = \{S_1, \ldots, S_m\}$ of sets $S_i = \{J_{i,1}, \ldots, J_{i,n_i}\}$ of $n_i$ jobs,
each of them arriving at time zero. A schedule $\mathcal{S}_p$ on $p$ processors is a set of piecewise constant functions¹
$\rho_{ij} : t \mapsto \rho^t_{ij}$ where $\rho^t_{ij}$ is the amount of processors allotted to job $J_{ij}$ at time $t$; $(\rho^t_{ij})$ are
arbitrary non-negative real numbers, such that at any time: $\sum_{ij} \rho^t_{ij} \le p$. Following the definition
introduced by [4], each job $J_{ij}$ goes through a series of phases $J^1_{ij}, \ldots, J^{q_{ij}}_{ij}$ with different degrees of
parallelism; the amount of work in each phase $J^k_{ij}$ is $w^k_{ij}$; at time $t$, during its $k$-th phase, job $J_{ij}$
progresses at a rate given by a speed-up function $\Gamma^k_{ij}(\rho^t_{ij})$ of the amount $\rho^t_{ij}$ of processors allotted to
$J_{ij}$, that is to say that the amount of work accomplished between $t$ and $t + dt$ during phase $J^k_{ij}$ is
$\Gamma^k_{ij}(\rho^t_{ij})\,dt$. Let $t^k_{ij}$ denote the completion time of the $k$-th phase of $J_{ij}$, i.e. $t^k_{ij}$ is the first time $t'$
such that $\int_{t^{k-1}_{ij}}^{t'} \Gamma^k_{ij}(\rho^t_{ij})\,dt = w^k_{ij}$ (with $t^0_{ij} = 0$). Job $J_{ij}$ is completed at time
$c_{ij} = t^{q_{ij}}_{ij}$. A schedule is valid if all jobs eventually complete, i.e., $c_{ij} < \infty$ for all $i, j$. Set $S_i$ is
completed at time $c_i = \max_{j=1..n_i} c_{ij}$. The flowtime of the jobs in a schedule $\mathcal{S}_p$ is:
$\mathrm{Flowtime}(\mathcal{S}_p) = \sum_{ij} c_{ij}$. The makespan of the jobs in $\mathcal{S}_p$ is: $\mathrm{Makespan}(\mathcal{S}_p) = \max_{ij} c_{ij}$.
The setflowtime of the sets in $\mathcal{S}_p$ is: $\mathrm{Setflowtime}(\mathcal{S}_p) = \sum_{i=1}^{m} c_i$. Note that: if the input collection
$S$ consists of a single set $S_1$, the setflowtime of a schedule $\mathcal{S}_p$ is simply the makespan of the jobs in $S_1$;
and if $S$ is a collection of singleton sets $S_i = \{J_{i,1}\}$, the setflowtime of $\mathcal{S}_p$ is simply the flowtime of the
jobs. The setflowtime thus allows one to measure a continuous range of objective functions from makespan to
flowtime. Our goal is to minimize the setflowtime of a collection of sets of jobs arriving at time 0.

¹ Requiring the functions $(\rho_{ij})$ to be piecewise constant is not restrictive since any finite set of reasonable
(i.e., Riemann integrable) functions can be uniformly approximated from below within an arbitrary precision
by piecewise constant functions. In particular, all of our results hold if the $\rho_{ij}$ are piecewise continuous
functions.
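
The three objectives above can be illustrated with a minimal Python sketch. It assumes the completion times $c_{ij}$ are already known and stored as one list per set; the function names and the sample numbers are illustrative choices, not part of the original text.

```python
from typing import List

# completion[i][j] is the completion time c_ij of job j of set S_i
# (illustrative data layout, assumed here for the sketch).

def flowtime(completion: List[List[float]]) -> float:
    """Sum of the completion times of all jobs."""
    return sum(c for jobs in completion for c in jobs)

def makespan(completion: List[List[float]]) -> float:
    """Largest completion time over all jobs."""
    return max(c for jobs in completion for c in jobs)

def setflowtime(completion: List[List[float]]) -> float:
    """Sum over the sets of the completion time of their last job."""
    return sum(max(jobs) for jobs in completion)

single_set = [[1.0, 3.0, 2.0]]        # one set of three jobs
singletons = [[1.0], [3.0], [2.0]]    # three singleton sets

print(setflowtime(single_set), makespan(single_set))
print(setflowtime(singletons), flowtime(singletons))
```

On the single set the setflowtime coincides with the makespan, and on the singleton sets it coincides with the flowtime, which are the two extreme cases described in the text.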

We denote by $\mathrm{OPT}_p(S)$ (or simply $\mathrm{OPT}_p$ or $\mathrm{OPT}$ if the context is clear) the optimal setflowtime of
a valid schedule on $p$ processors for collection $S$: $\mathrm{OPT}_p = \inf_{\text{valid schedules } \mathcal{S}_p} \mathrm{Setflowtime}(\mathcal{S}_p)$.

Speed-up functions. We make the following reasonable assumptions on the speed-up functions. In
the following, we consider that each speed-up function is non-decreasing and sub-linear (i.e., such that
for all $i, j, k$, $\rho < \rho' \Rightarrow \frac{\Gamma^k_{ij}(\rho)}{\rho} \ge \frac{\Gamma^k_{ij}(\rho')}{\rho'}$). These assumptions are usually verified
(or at least desirable) in practice: non-decreasing means that giving more processors cannot deteriorate the
performance; sub-linear means that a job makes better use of fewer processors: this is typically true when
parallelism does not take too much advantage of local caches. As shown in [3], two types of speed-up functions
will be of particular interest here: the sequential phase where $\Gamma(\rho) = 1$ for all $\rho \ge 0$ (the job progresses at
constant speed even if no processor is allotted to it, similarly to an idle period); and the fully parallel
phase where $\Gamma(\rho) = \rho$ for all $\rho \ge 0$. Two classes of instances will be useful in the following. We denote by
(Par-Seq)* the class of all instances in which each phase of each job is either sequential or fully parallel,
and by Par-Seq the class of all instances in which each job consists of a fully parallel phase followed by
a sequential phase. Given a (Par-Seq)* job $J$, we denote by $\mathrm{par}(J)$ (resp., $\mathrm{seq}(J)$) the sum of the fully
parallel (resp., sequential) works over all the phases of $J$. Given a set $S_i = \{J_{i,1}, \ldots, J_{i,n_i}\}$ of (Par-Seq)*
jobs, we denote by $\mathrm{par}(S_i) = \sum_{j=1}^{n_i} \mathrm{par}(J_{i,j})$ and $\mathrm{seq}(S_i) = \max_{j=1,\ldots,n_i} \mathrm{seq}(J_{i,j})$.

Non-clairvoyant scheduling. In a real-life system, the scheduler is typically not aware of the speed-up
functions of the jobs, nor of the amount of work that remains for each job. Following the definition
in [4, 3], we consider the non-clairvoyant setting of the problem. In this setting, the scheduler knows
nothing about the progress of each job and is only informed that a job is completed at the time of its
completion. In particular, it is not aware of the different phases that the job goes through (neither of the
amount of work nor of the speed-up function). It follows that even if all the job sets arrive at time 0, the
scheduler has to design an online strategy to adapt its allocation on-the-fly to the overall progress of the
jobs. We say that a given scheduler $A_p$ is $c$-competitive if it computes a schedule $A_p(S)$ whose setflowtime
is at most $c$ times the optimal clairvoyant setflowtime (that is aware of the characteristics of the phases of
each job), i.e., such that $\mathrm{Setflowtime}(A_p(S)) \le c \cdot \mathrm{OPT}_p(S)$ for all instances $S$. Due to the overwhelming
advantage granted to the optimum, which knows all the hidden characteristics of the jobs, it is sometimes
necessary, in order to obtain relevant information on a non-clairvoyant algorithm, to limit the power of the
optimum by reducing its resources. We say that a scheduler $A_p$ is $s$-speed $c$-competitive if it computes
a schedule $A_{sp}(S)$ on $sp$ processors whose setflowtime is at most $c$ times the optimal setflowtime on $p$
processors only, i.e., such that $\mathrm{Setflowtime}(A_{sp}(S)) \le c \cdot \mathrm{OPT}_p(S)$ for all instances $S$.

We analyse two non-clairvoyant schedulers, namely Equi and Equi∘Equi, and show that they have
an optimal competitive ratio up to constant multiplicative factors. The following two theorems are our
main results and are proved in Propositions 7, 8 and 12.

Theorem 1 (Makespan minimization) Equi is a $(1 + o(1)) \frac{\ln n}{\ln\ln n}$-competitive non-clairvoyant
algorithm for the makespan minimization of a set of $n$ jobs arriving at time $t = 0$. Furthermore, no
non-clairvoyant deterministic (resp. randomized) algorithm is $s$-speed $c$-competitive for any
$s = o(\frac{\ln n}{\ln\ln n})$ and $c < \frac{\ln n}{2 \ln\ln n}$ (resp. $c < \frac{\ln n}{4 \ln\ln n}$).

Theorem 2 (Main result) Equi∘Equi is a $(2 + \sqrt{3} + o(1)) \frac{\ln n}{\ln\ln n}$-competitive non-clairvoyant
algorithm for the setflowtime minimization of a collection of sets of jobs arriving at time $t = 0$, where $n$ is
the maximum cardinality of the sets. (Clearly the lower bound on the competitive ratio given above holds as
well for this problem.)

3 Non-clairvoyant scheduling reduces to (Par-Seq)* instances 

In [3], Edmonds shows that for the flowtime objective function, one can reduce the analysis of the
competitiveness of non-clairvoyant algorithms to instances composed of a sequence of infinitesimal sequential
or parallel work. It turns out that, as shown in Proposition 4 below, his reduction is far more general and
applies to any reasonable objective function (including makespan, setflowtime, stretch, energy
consumption, etc.), and furthermore reduces the analysis to instances where jobs are composed of a finite
sequence of positive sequential or fully parallel work, i.e., to true (Par-Seq)* instances.

Consider a collection² of $n$ jobs $J_1, \ldots, J_n$ where $J_i$ consists of a sequence of phases $J_i^1, \ldots, J_i^{q_i}$ of
work $w_i^1, \ldots, w_i^{q_i}$ with speed-up functions $\Gamma_i^1, \ldots, \Gamma_i^{q_i}$. Consider a speed $s > 0$. Let $A_{sp}$ be an arbitrary
non-clairvoyant scheduler on $sp$ processors, and $O_p$ a valid schedule of $J_1, \ldots, J_n$ on $p$ processors.

Lemma 3 (Reduction to (Par-Seq)* instances) There exists a collection of (Par-Seq)* jobs $J'_1, \ldots, J'_n$
such that $O_p[J'/J]$ is a valid schedule of $J'_1, \ldots, J'_n$ and $A_{sp}(J') = A_{sp}(J)[J'/J]$, where $\mathcal{S}[J'/J]$ denotes
the schedule obtained by scheduling job $J'_i$ instead of $J_i$ in a schedule $\mathcal{S}$.

Proof. The present proof only simplifies the proof originally given in [3] in the following ways: the jobs
$J'_1, \ldots, J'_n$ consist of a finite number of phases (and are thus a valid finitely described instance), and the
schedules computed by algorithm $A_{sp}$ on instances $J'_1, \ldots, J'_n$ and $J_1, \ldots, J_n$ are identical, which avoids
having to consider infinitely many schedules to construct $J'$ from $J$.

Consider the two schedules $A_{sp}(J)$ and $O_p$. Consider job $J_1$ (the construction of $J'_i$ is identical for
$J_i$, $i \ge 2$). Let $\rho_A(t)$ and $\rho_O(t)$ be the number of processors allotted over time to $J_1$ by $A_{sp}(J)$ and $O_p$
respectively. Let $\varphi(t)$ be the time $t'$ at which the portion of work of $J_1$ executed in $O_p$ at time $t$ is
executed in $A_{sp}(J)$. Let $\Gamma_{t'}$ be the speed-up function of the portion of work of $J_1$ executed in $A_{sp}(J)$ at
time $t'$. By construction, for all $t$, the same portion of work $dw$ of $J_1$ is executed between $t$ and $t + dt$
in $O_p$ and between $\varphi(t)$ and $\varphi(t + dt) = \varphi(t) + d\varphi(t)$ in $A_{sp}(J)$ with the same speed-up function
$\Gamma_{\varphi(t)}$, thus: $dw = \Gamma_{\varphi(t)}(\rho_O(t))\,dt = \Gamma_{\varphi(t)}(\rho_A(\varphi(t)))\,d\varphi(t)$; it follows that $\varphi$'s derivative is
$\varphi'(t) = \frac{\Gamma_{\varphi(t)}(\rho_O(t))}{\Gamma_{\varphi(t)}(\rho_A(\varphi(t)))}$ ($\ge 0$, $\varphi$ is an increasing function). $\rho_A(\varphi(t))$ and $\rho_O(t)$ are
(by definition) piecewise constant functions. Let $t_1 = 0 < t_2 < \cdots < t_\ell$ such that $\rho_A(\varphi(t))$ and $\rho_O(t)$ are
constant on each time interval $[t_k, t_{k+1})$ and zero beyond $t_\ell$; let $t'_k = \varphi(t_k)$, so that $\rho_A(t')$ is constant
on each time interval $(t'_k, t'_{k+1})$; let $\rho_A^k = \rho_A(t'_k)$ and $\rho_O^k = \rho_O(t_k)$. By construction, the portion of work
of $J_1$ executed by $A_{sp}(J)$ between times $t'_k$ and $t'_{k+1}$ is executed by $O_p$ between times $t_k$ and $t_{k+1}$. $J'_1$
consists of a sequence of $(\ell - 1)$ phases, sequential or fully parallel depending on the relative amount of
processors $\rho_O^k$ and $\rho_A^k$ allotted by $O_p$ and $A_{sp}(J)$ to $J_1$ during time intervals $[t_k, t_{k+1}]$ and $[t'_k, t'_{k+1}]$
respectively. The $k$-th phase of $J'_1$ is defined as follows:

• If $\rho_O^k \le \rho_A^k$, the $k$-th phase of $J'_1$ is a sequential work of $w^k = t'_{k+1} - t'_k$.

• If $\rho_O^k > \rho_A^k$, the $k$-th phase of $J'_1$ is a fully parallel work of $w^k = \rho_A^k \cdot (t'_{k+1} - t'_k)$.

The $k$-th phase of $J'_1$ is designed to fit exactly in the overall amount of processors allotted by $A_{sp}$ to $J_1$
during $[t'_k, t'_{k+1}]$; thus, since $A_{sp}$ is non-clairvoyant, $A_{sp}(J') = A_{sp}(J)[J'/J]$. Let us now verify that the
$k$-th phase of $J'_1$ fits in the overall amount of processors allotted by $O_p$ to $J_1$ during $[t_k, t_{k+1}]$.

• If $\rho_O^k \le \rho_A^k$, $w^k = \int_{t'_k}^{t'_{k+1}} dt' = \int_{t_k}^{t_{k+1}} \varphi'(t)\,dt = \int_{t_k}^{t_{k+1}} \frac{\Gamma_{\varphi(t)}(\rho_O^k)}{\Gamma_{\varphi(t)}(\rho_A^k)}\,dt \le \int_{t_k}^{t_{k+1}} dt = t_{k+1} - t_k$,
since the $\Gamma_{\varphi(t)}$ are non-decreasing functions.

• If $\rho_O^k > \rho_A^k$, $w^k = \rho_A^k \int_{t'_k}^{t'_{k+1}} dt' = \rho_A^k \int_{t_k}^{t_{k+1}} \frac{\Gamma_{\varphi(t)}(\rho_O^k)}{\Gamma_{\varphi(t)}(\rho_A^k)}\,dt \le \rho_A^k \int_{t_k}^{t_{k+1}} \frac{\rho_O^k}{\rho_A^k}\,dt = \rho_O^k \cdot (t_{k+1} - t_k)$,
since the $\Gamma_{\varphi(t)}$ are sub-linear functions.

It follows that in both cases, the $k$-th phase of $J'_1$ can be completed in the space allotted to $J_1$ in $O_p$
during $[t_k, t_{k+1}]$. □

² Note that the reduction to (Par-Seq)* instances applies as well to jobs with release dates, precedence
constraints, or any other type of constraints, since Lemma 3 simply consists in remapping the phases of the jobs
within two valid schedules that naturally satisfy these additional constraints.
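
The case distinction that defines the phases of $J'_1$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the breakpoints $t'_k$ and the constant allotments $\rho_A^k$, $\rho_O^k$ are taken as given lists, and the function name `remap_phases` and the sample values are hypothetical.

```python
from typing import List, Tuple

Phase = Tuple[str, float]  # ("seq" | "par", amount of work)

def remap_phases(t_prime: List[float],
                 rho_A: List[float],
                 rho_O: List[float]) -> List[Phase]:
    """Build the phases of J'_1 from the breakpoints t'_k of A_sp(J) and the
    allotments of A_sp(J) (rho_A) and O_p (rho_O) on each interval."""
    phases = []
    for k in range(len(rho_A)):
        length = t_prime[k + 1] - t_prime[k]
        if rho_O[k] <= rho_A[k]:
            # O_p gives no more processors than A_sp: sequential work
            phases.append(("seq", length))
        else:
            # O_p gives more processors than A_sp: fully parallel work
            phases.append(("par", rho_A[k] * length))
    return phases

# Hypothetical data: two intervals [0, 2] and [2, 3] in the schedule A_sp(J).
print(remap_phases([0.0, 2.0, 3.0], rho_A=[1.0, 0.5], rho_O=[0.5, 2.0]))
```

Each phase takes exactly $t'_{k+1} - t'_k$ time units under the allotment $\rho_A^k$, which is why the schedule of the non-clairvoyant algorithm is unchanged by the substitution.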

Consider an arbitrary non-clairvoyant scheduling problem where the goal is to minimize an objective
function $F$ over the set of all valid schedules of an instance of jobs $J_1, \ldots, J_n$. Assume that $F$ is monotonic
in the sense that $F(\mathcal{S}) \le F(\mathcal{S}')$ if $\mathcal{S}$ and $\mathcal{S}'$ are two valid schedules of $J_1, \ldots, J_n$ such that for all $i$, $J_i$
receives at any time fewer processors in $\mathcal{S}$ than in $\mathcal{S}'$ (note that since a completed job does not receive
processors, this implies that for all $i$, $J_i$ cannot complete in $\mathcal{S}'$ before it completes in $\mathcal{S}$). Note that all
standard objective functions are monotonic: flowtime, makespan, setflowtime, stretch, energy consumption,
etc. Then,

Proposition 4 Any non-clairvoyant algorithm $A^F$ for a monotonic objective function $F$ that is $s$-speed
$c$-competitive over (Par-Seq)* instances is also $s$-speed $c$-competitive over all instances of jobs going through
phases with arbitrary non-decreasing sublinear speed-up functions.

Proof. Consider a non-(Par-Seq)* instance $J = \{J_1, \ldots, J_n\}$. Denote by $\mathrm{OPT}_p(J)$ the optimal cost for
$J$, i.e., $\mathrm{OPT}_p(J) = \inf\{F(\mathcal{S}) : \mathcal{S}$ is a valid schedule of $J$ on $p$ processors$\}$. Consider an arbitrarily small
$\epsilon > 0$ and a valid schedule $O_p$ of $J$ such that $F(O_p) \le \mathrm{OPT}_p(J) + \epsilon$ (note that we do not need that
an optimal schedule exists). Let $J'$ be the (Par-Seq)* instance given by Lemma 3 from $J$, $A^F_{sp}$, and $O_p$.
Since $A^F_{sp}(J') = A^F_{sp}(J)[J'/J]$, $F(A^F_{sp}(J)) = F(A^F_{sp}(J'))$. But $A^F_{sp}$ is $s$-speed $c$-competitive for $J'$, so:
$F(A^F_{sp}(J)) \le c \cdot \mathrm{OPT}_p(J') \le c \cdot F(O_p[J'/J]) \le c \cdot F(O_p) \le c \cdot \mathrm{OPT}_p(J) + c\epsilon$, as $O_p[J'/J]$ is a valid
schedule of $J'$ and $F$ is monotonic. Decreasing $\epsilon$ to zero completes the proof. □

It follows that for any non-clairvoyant scheduling problem, it is enough to analyse the competitiveness
of a non-clairvoyant algorithm on (Par-Seq)* instances. Sequential and parallel phases are both unrealistic
(sequential phases that progress at a constant rate even if they receive no processors are no less legitimate
than fully parallel phases, which do not exist in reality either). Nevertheless, these are much easier to
handle in competitive analysis, and Proposition 4 guarantees that these two extreme(ly simple) regimes
are sufficiently general to cover the range of all possible non-decreasing sublinear functions. We shall from
now on consider only (Par-Seq)* instances.

4 The single set case 

In this section, we focus on the case where the collection $S$ consists of a unique set $S_1 = \{J_1, \ldots, J_n\}$.
The problem thus consists in minimizing the makespan of the set of jobs $S_1$. This problem is interesting
on its own and, as far as we know, no competitive non-clairvoyant algorithm was known. Furthermore,
the analysis that follows is one of the keys to the main result of the next section.

4.1 EQUI Algorithm



Equi is the classic operating system approach to non-clairvoyant scheduling. It consists in giving an
equal amount of processors to each uncompleted job (operating systems approximate this strategy by a
preemptive round robin policy). Formally, given $p$ processors, if $N(t)$ denotes the number of uncompleted
jobs at time $t$, Equi allots $\rho_i^t = p/N(t)$ processors to each uncompleted job $J_i$ at time $t$.
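
The allocation rule of Equi can be simulated on (Par-Seq)* jobs with a short event-driven loop. The following Python sketch is an illustration only: the representation of a job as a list of `("par" | "seq", work)` phases, the function name `equi` and the numerical tolerance are assumptions made here.

```python
from typing import List, Tuple

Phase = Tuple[str, float]  # ("par" | "seq", amount of work)

def equi(jobs: List[List[Phase]], p: float = 1.0) -> List[float]:
    """Completion time of each job when p processors are shared equally
    among the uncompleted jobs (sequential phases progress at rate 1)."""
    # Remaining phases of each job, ignoring empty phases.
    todo = [[[kind, float(w)] for kind, w in job if w > 0] for job in jobs]
    done = [0.0] * len(jobs)
    alive = [i for i, phases in enumerate(todo) if phases]
    t = 0.0
    while alive:
        share = p / len(alive)  # Equi: p / N(t) processors per job
        rate = {i: (share if todo[i][0][0] == "par" else 1.0) for i in alive}
        # Advance to the next instant at which some phase completes.
        dt = min(todo[i][0][1] / rate[i] for i in alive)
        t += dt
        still_alive = []
        for i in alive:
            todo[i][0][1] -= rate[i] * dt
            if todo[i][0][1] <= 1e-12:  # tolerance for rounding errors
                todo[i].pop(0)
            if todo[i]:
                still_alive.append(i)
            else:
                done[i] = t
        alive = still_alive
    return done

# One sequential job and one parallel job, each of work 1, on one processor.
print(equi([[("seq", 1.0)], [("par", 1.0)]]))
```

In this small run the parallel job only receives half a processor while the sequential job is alive, and the whole processor afterwards.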

In [4, Theorem 3.1], the authors show that Equi is $(2 + \sqrt{3})$-competitive for the flowtime of the
jobs when all the jobs arrive at time $t = 0$. As pointed out in [3], the key to the analysis is that the
contribution to the flowtime of the sequential phases is independent of the scheduling policy, and thus the
performance of the scheduler is measured by its ability to give a sufficiently large amount of processors
to the parallel phases. When parallel work is delayed by sequential work with respect to the optimum
strategy, the number of uncompleted jobs in a parallel phase increases and Equi allots more and more
processing power to parallel work. It follows that Equi self-adjusts naturally, which yields a
constant competitive ratio for flowtime minimization.

When the objective is to minimize the makespan, the times at which the sequential phases are sched- 
uled matter because they can be arbitrarily delayed by parallel phases as shown in the following example. 

Example 1 Consider $n = \ell^\ell$ jobs arriving at time 0 on one processor. Between time $t = 0$ and $t = 1$, a
fraction $1 - 1/\ell$ of the $\ell^\ell$ jobs are in a sequential phase of work 1 and all of them complete at time 1; the
other $1/\ell$ fraction of the jobs is in a parallel phase of work $1/\ell^\ell$ each; Equi allots to each job an equal
processing power $1/\ell^\ell$ during this time interval and at time 1 only remain the $\ell^\ell/\ell = \ell^{\ell-1}$ jobs that have
just finished their first parallel phase. We continue recursively as follows until time $t = \ell$, as illustrated in
Fig. 1: at integer time $t = i < \ell$, $\ell^{\ell-i}$ jobs are still uncompleted; between time $t = i$ and $t = i + 1$, a fraction
$1 - 1/\ell$ of the $\ell^{\ell-i}$ jobs are in a sequential phase of work 1 and all of them complete at time $i + 1$; the
other $1/\ell$ fraction of the jobs is in a parallel phase of work $1/\ell^{\ell-i}$ each; Equi allots to each job an equal
processing power $1/\ell^{\ell-i}$ during this time interval and at time $i + 1$ only remain the $\ell^{\ell-i}/\ell = \ell^{\ell-(i+1)}$ jobs
that have just finished their $(i+1)$-th parallel phase. At time $t = \ell$, there only remains one job, which completes
at time $\ell + 1$ after a sequential phase of work 1.
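
The arithmetic of Example 1 can be checked round by round with exact fractions. The Python sketch below is a bookkeeping illustration only (the function name `example_rounds` and the choice $\ell = 3$ are assumptions made here); it does not simulate a general scheduler.

```python
from fractions import Fraction

def example_rounds(l: int):
    """Round-by-round accounting of Example 1 under Equi on one processor.
    Returns, for each round i, (jobs alive, jobs in a parallel phase,
    duration of the round, total parallel work done in the round)."""
    rounds = []
    for i in range(l):
        alive = l ** (l - i)              # uncompleted jobs at time i
        share = Fraction(1, alive)        # processors Equi gives to each job
        parallel_jobs = alive // l        # the 1/l fraction in a parallel phase
        work_each = Fraction(1, alive)    # parallel work of each such job
        duration = work_each / share      # time needed at rate `share`
        rounds.append((alive, parallel_jobs, duration, parallel_jobs * work_each))
    return rounds

l = 3
rounds = example_rounds(l)
# The last remaining job still has a sequential phase of work 1.
equi_makespan = sum(duration for _, _, duration, _ in rounds) + 1
total_parallel = sum(work for _, _, _, work in rounds)
print(l ** l, equi_makespan, total_parallel)
```

Every round lasts exactly one time unit and carries $1/\ell$ units of parallel work, so Equi needs $\ell + 1$ time units although the total parallel work is only 1.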



[Figure 1: timeline of the execution at times $t = 0, 1, 2, 3, \ldots, \ell + 1$, with $n$, $n/\ell$, $n/\ell^2$, ..., $\ell$, 1 jobs
alive. At each step, a fraction $1 - 1/\ell$ of the jobs is in a sequential phase and completes afterwards, and a
fraction $1/\ell$ of the jobs is in a parallel phase.]

Figure 1: An inefficient execution of Equi.



It follows that for this instance, Equi achieves a makespan of $\ell + 1$. But the amount of parallel work
executed within each time interval $[i, i + 1]$ for $i = 0, \ldots, \ell - 1$ equals $1/\ell$. It follows that an optimal
(clairvoyant) scheduler can complete all the parallel work in one time unit and then finish the remaining
sequential work before time 2. Since $n = \ell^\ell$ and $\ell \ge \frac{\ln n}{\ln\ln n}$, we conclude:

Fact 5 Equi is not $c$-competitive for the makespan minimization problem, for any $c < \frac{\ln n}{2 \ln\ln n}$.
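
The relation between $\ell$ and $n = \ell^\ell$ used above can be checked numerically. The following Python sketch is an illustration only; the helper name `log_ratio` and the sampled values of $\ell$ are arbitrary choices made here, and the bound 2 on the optimal makespan is the one derived in the text.

```python
import math

def log_ratio(n: int) -> float:
    """ln n / ln ln n, the quantity appearing in the competitive ratio."""
    return math.log(n) / math.log(math.log(n))

for l in (3, 4, 8, 16, 64):
    n = l ** l                 # number of jobs in Example 1
    equi_makespan = l + 1      # makespan of Equi on Example 1
    opt_bound = 2              # the optimal makespan is below 2
    print(l, round(log_ratio(n), 2), equi_makespan / opt_bound)
```

For every sampled $\ell \ge 3$, $\ell$ is at least $\ln n / \ln\ln n$, so the ratio $(\ell + 1)/2$ exceeds half of that quantity.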

It follows that, as opposed to the flowtime minimization, we need to take into account the delay
introduced by parallel phases over sequential phases. (Note that for the instance above, the flowtime
achieved by Equi is 1 ~ 1 ]/y e = $1 + 1/\ell + o(1/\ell)$ which is asymptotically optimal.)

4.2 Analysis of Equi for makespan minimization 

Thanks to Proposition 4, we focus on a (Par-Seq)* instance $S = \{J_1, \ldots, J_n\}$. By rescaling the parallel
work in each job, we can assume w.l.o.g. that $p = 1$. We show that the behavior exhibited in the
example of Section 4.1 is indeed the worst case behavior of Equi. Let us define the Par-Seq instance
$S' = \{J'_1, \ldots, J'_n\}$ where each $J'_i$ consists of a fully parallel phase of work $\mathrm{par}(J_i)$ followed by a sequential
phase of work $\mathrm{seq}(J_i)$. Observe that:

Lemma 6 $\mathrm{Makespan}(\mathrm{Equi}(S)) \le \mathrm{Makespan}(\mathrm{Equi}(S'))$.

Proof. Since all the jobs arrive at time 0, the number of uncompleted jobs is a non-increasing function
of time. It follows that the amount of processors allotted by Equi to a given job is a non-decreasing
function of time. Thus, moving all the parallel work to the front can only delay the completion of the
jobs, since fewer processors will then be allocated to each given piece of parallel work. □

Proposition 7 Equi is $(1 + o(1)) \frac{\ln n}{\ln\ln n}$-competitive for the makespan minimization problem.

Proof. Consider the schedule $\mathrm{Equi}(S')$ and let $T = \mathrm{Makespan}(\mathrm{Equi}(S'))$. We write $[0, T]$ as the
disjoint union of two sets $A$ and $\bar{A}$. Set $\alpha$ = ^"If ". Recall that $N(t)$ is the number of uncompleted jobs
at time $t$. Let $s_t$ be the number of uncompleted jobs in a sequential phase at time $t$. Set $A$ is the set of all
the instants where the fraction of jobs in a sequential phase is at least $1 - \alpha$, and $\bar{A}$ is its complementary
set: i.e., $A = \{0 \le t \le T : s_t \ge (1 - \alpha) N(t)\}$ and $\bar{A} = \{0 \le t \le T : s_t < (1 - \alpha) N(t)\}$. Clearly,
$T = |A| + |\bar{A}|$, with $|X| = \int_X dt$. We now bound $|A|$ and $|\bar{A}|$ independently.

At any time $t$ in $\bar{A}$, the total amount of parallel work completed between $t$ and $t + dt$ is at least $\alpha\,dt$.
Since the total amount of parallel work is $\mathrm{par}(S')$, we get $\int_{\bar{A}} \alpha\,dt \le \mathrm{par}(S')$. Thus, $|\bar{A}| \le \mathrm{par}(S')/\alpha$.

Now, let $t_1 < \cdots < t_q$ with $t_k \in A$ for all $k$, such that the time intervals $I_1 = [t_1, t_1 + \mathrm{seq}(S')), \ldots,$
$I_q = [t_q, t_q + \mathrm{seq}(S'))$ form a collection of non-overlapping intervals of length $\mathrm{seq}(S')$ that covers $A$. Once the
sequential phase of a Par-Seq job has begun at or before time $t$, the job completes before time $t + \mathrm{seq}(S')$.
Since at time $t_k$, at least $(1 - \alpha) \cdot N(t_k)$ jobs are in a sequential phase, at time $t_{k+1} \ge t_k + \mathrm{seq}(S')$, we
have thus: $N(t_{k+1}) \le \alpha N(t_k)$. It follows that $N(t_k) \le \alpha^k \cdot n$. Since $N(t_q) \ge 1$, $q \le \frac{\ln n}{\ln(1/\alpha)}$. But $A$ is
covered by $q$ time intervals of length $\mathrm{seq}(S')$, so: $|A| \le \frac{\ln n}{\ln(1/\alpha)} \mathrm{seq}(S')$. Finally,

$\mathrm{Makespan}(\mathrm{Equi}(S)) \le \mathrm{Makespan}(\mathrm{Equi}(S')) = T$
$\le \frac{\mathrm{par}(S')}{\alpha} + \frac{\ln n}{\ln(1/\alpha)} \mathrm{seq}(S')$
$\le (1 + o(1)) \frac{\ln n}{\ln\ln n} \max(\mathrm{par}(S'), \mathrm{seq}(S'))$
$= (1 + o(1)) \frac{\ln n}{\ln\ln n} \max(\mathrm{par}(S), \mathrm{seq}(S))$
$\le (1 + o(1)) \frac{\ln n}{\ln\ln n} \mathrm{OPT}(S)$.

□
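
To see how the two bounds $|\bar{A}| \le \mathrm{par}(S')/\alpha$ and $|A| \le \frac{\ln n}{\ln(1/\alpha)} \mathrm{seq}(S')$ can combine into a factor $(1 + o(1)) \frac{\ln n}{\ln\ln n}$, here is a worked computation under an explicit assumption: the value $\alpha = (\ln\ln n)^2 / \ln n$ below is a hypothetical choice made for illustration only, and is not claimed to be the value used in the proof above.

```latex
% Assumption (illustration only): \alpha = (\ln\ln n)^2 / \ln n.
\begin{align*}
\frac{1}{\alpha} &= \frac{\ln n}{(\ln\ln n)^2}
   = \frac{1}{\ln\ln n}\cdot\frac{\ln n}{\ln\ln n}
   = o(1)\cdot\frac{\ln n}{\ln\ln n},\\[2pt]
\ln\frac{1}{\alpha} &= \ln\ln n - 2\ln\ln\ln n = (1 - o(1))\,\ln\ln n,\\[2pt]
T &\le \frac{\mathrm{par}(S')}{\alpha} + \frac{\ln n}{\ln(1/\alpha)}\,\mathrm{seq}(S')
   \le \bigl(o(1) + 1 + o(1)\bigr)\,\frac{\ln n}{\ln\ln n}\,
      \max\bigl(\mathrm{par}(S'),\,\mathrm{seq}(S')\bigr).
\end{align*}
```

Any $\alpha$ tending to 0 more slowly than $\ln\ln n / \ln n$, while keeping $\ln(1/\alpha) = (1 - o(1)) \ln\ln n$, gives the same conclusion; the first term becomes negligible and the second term carries the leading constant 1.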






4.3 Equi is asymptotically optimal up to a factor 2 

The following proposition generalizes the example given in Section 4.1 and shows that Equi is asymptotically
optimal in the worst case. Note that increasing the number of processors by a factor $s$ does not improve
the competitive ratio of any deterministic or randomized algorithm as long as $s = o(\frac{\ln n}{\ln\ln n})$, i.e., the
competitive ratio does not improve even if the number of processors increases (not too fast) with the
number of jobs.

Proposition 8 (Lower bound on the competitive ratio of any non-clairvoyant algorithm)
No non-clairvoyant algorithm $A$ has a competitive ratio less than $\gamma_d = \frac{\ln n}{2 \ln\ln n}$ if $A$ is deterministic, and
$\gamma_r = \frac{\ln n}{4 \ln\ln n}$ if $A$ is randomized.

Furthermore, no non-clairvoyant algorithm $A$ is $s$-speed $c$-competitive for any speed $s = o(\frac{\ln n}{\ln\ln n})$ if
$c < \gamma_d$ and $A$ is deterministic, or $c < \gamma_r$ if $A$ is randomized.

Proof. We first extend Example 1 to cover all deterministic algorithms. Consider the execution of an
algorithm $A_s$ given $s$ processors on the following instance. At time 0, $n = (s\ell)^\ell$ jobs are given. Since the
algorithm is non-clairvoyant, we set the phases afterwards. At time 1, we renumber the jobs $J_1, \ldots, J_n$
by non-decreasing processing power received between $t = 0$ and $t = 1$ in $A_s$. Between time $t = 0$ and
$t = 1$, we set the jobs $J_{(s\ell)^{\ell-1}+1}, \ldots, J_n$ (i.e., the last fraction $1 - 1/(s\ell)$ of the $(s\ell)^\ell$ jobs) to be in a
sequential phase of work 1 and say that all of them complete at time 1; each $J_j$ of the $J_1, \ldots, J_{(s\ell)^{\ell-1}}$ is
set in a parallel phase of work $\int_0^1 \rho_j^t\,dt$ each between time 0 and 1, where $\rho_j^t$ is the amount of processors
allotted to $J_j$ at time $t$. The processing power received by the last $1 - 1/(s\ell)$ fraction of jobs between
$t = 0$ and $t = 1$ is at least $s - 1/\ell$ and thus, the total parallel work assigned to the jobs between 0 and 1
is at most $1/\ell$. At time 1 only remain the jobs $J_1, \ldots, J_{(s\ell)^{\ell-1}}$ that have just finished their first parallel
phase. We continue recursively as follows until time $t = \ell$: at integer time $t = i < \ell$, $(s\ell)^{\ell-i}$ jobs are still
uncompleted; between time $t = i$ and $t = i + 1$, the fraction $1 - 1/(s\ell)$ of the $(s\ell)^{\ell-i}$ jobs that received
the most processing power are set in a sequential phase of work 1 and all of them complete at time $i + 1$;
each job $J_j$ of the other $1/(s\ell)$ fraction is set in a parallel phase of work $\int_i^{i+1} \rho_j^t\,dt$ each. At time $i + 1$ only
remain the $(s\ell)^{\ell-i}/(s\ell) = (s\ell)^{\ell-(i+1)}$ jobs that have just finished their $(i+1)$-th parallel phase. At time $t = \ell$,
there only remains one job, which completes at time $\ell + 1$ after a sequential phase of work 1. It follows
that for this instance, $A_s$ achieves a makespan of $\ell + 1$. But the amount of parallel work executed within
each time interval $[i, i + 1]$ for $i = 0, \ldots, \ell - 1$ is at most $1/\ell$. It follows that an optimal (clairvoyant)
scheduler on 1 processor can complete all the parallel work in one time unit and then finish the remaining
sequential work before time 2. But $n = (s\ell)^\ell$, $\ell \geq \frac{\ln n}{\ln\ln n}$, which concludes the proof. 
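
The adversarial construction above can be replayed numerically. The following is a minimal sketch, not part of the original proof: it assumes that the algorithm's allotment is constant over each unit interval (so the list returned by the hypothetical `shares` callback is also the processing power received during the round) and that the shares sum to at most `s`. The names `adversary` and `equi_shares` are introduced here for illustration only.

```python
from fractions import Fraction


def equi_shares(alive, s):
    """Equi: every uncompleted job receives s / alive processors."""
    return [Fraction(s, alive)] * alive


def adversary(s, ell, shares=equi_shares):
    """Replay the lower-bound instance against an allotment rule.

    Returns (makespan of the algorithm, total parallel work of the instance).
    Assumption of this sketch: shares(alive, s) is constant during each round.
    """
    alive = (s * ell) ** ell            # n = (s*ell)^ell jobs at time 0
    parallel_work = Fraction(0)
    for _ in range(ell):
        received = sorted(shares(alive, s))   # non-decreasing processing power
        keep = alive // (s * ell)             # least-served fraction 1/(s*ell)
        # The least-served jobs are declared parallel, with work equal to
        # exactly what they received; the others are sequential of work 1.
        round_parallel = sum(received[:keep])
        assert round_parallel <= Fraction(1, ell)
        parallel_work += round_parallel
        alive = keep                          # sequential jobs complete now
    # One job is left; it completes after a last sequential phase of work 1.
    return ell + 1, parallel_work


if __name__ == "__main__":
    makespan, par = adversary(s=2, ell=3)
    print(makespan, par)
```

Under these assumptions the total parallel work never exceeds 1, so a clairvoyant scheduler on one processor finishes before time 2 while the algorithm needs $\ell + 1$ time units.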

We use Yao's principle (see [HI H3]) to extend the result to randomized algorithms. Due to space 
constraints, we only sketch the proof. Take an arbitrary deterministic scheduler A; we will show that A 
achieves expected makespan of at least 41 1 ^ 1 " n on the random instance obtained by: 1) making n copies 
of each job in the instance of Example [U 2) dividing the parallel work of each job by n; and 3) taking a 
random permutation of the n 2 resulting jobs. Take e > 0, at time 1, at most jobs have received at least 
Since A is non-clairvoyant and since the jobs are randomly permuted, the expected number of jobs 
starting with a parallel phase in total) that have received between time and 1 at most processors 

is at least jp^g - Since the hypergeometric distribution (the distribution given by a permutation, see [6]) 
is more concentrated than the binomial, the Chernoff bound tells that the complementary probability 

2 

that at most 2 (i+e)l J ^ s ^id not complete their parallel phase between time and 1 is exponentially 

2 

small. Reasoning recursively up to time £, conditionnally to the fact that at least J°^ s are s ^ n 

alive at time i, we conclude that with constant probability a job will survive up to time I 4/^^ ■ ^ 
5 Non-Clairvoyant Batch Set Scheduling 



We now go back to the general problem. Consider a collection $S = \{S_1, \ldots, S_m\}$ of $m$ sets $S_i = \{J_{i,1}, \ldots, J_{i,n_i}\}$ of $n_i$ (Par-Seq)* jobs, each of them arriving at time zero. The goal is to minimize the 
setflowtime of the sets. 
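
As a reminder of the objective, here is a minimal sketch of the setflowtime metric computed from job completion times. The function name and the input layout (one list of completion times per set) are choices made for this illustration.

```python
def setflowtime(completion_times):
    """Sum, over all sets, of the time at which the last job of the set completes.

    completion_times: list of non-empty lists, one list of job completion
    times per set.
    """
    return sum(max(times) for times in completion_times)


if __name__ == "__main__":
    # A single set: the setflowtime is the makespan of the jobs.
    print(setflowtime([[1, 4, 2]]))
    # Singleton sets: the setflowtime is the flowtime of the jobs.
    print(setflowtime([[1], [4], [2]]))
```

The two calls illustrate the two extreme cases of the metric: makespan for a single set and flowtime for singleton sets.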

5.1 Equi∘Equi Algorithm 

In the context of data broadcast with dependencies, and for the purpose of proving the competitiveness 
of their broadcast scheduler, the authors of [15] develop a strategy, namely Equi∘A, for Seq-Par instances 
(i.e., where each job consists of a sequential phase followed by a fully parallel phase). The Equi∘A 
strategy consists in allotting an equal amount $\rho$ of processors to each uncompleted set of jobs and splitting 
arbitrarily (according to some algorithm A) this amount $\rho$ of processors among the uncompleted jobs 
within each set. This strategy is shown to be $O(1)$-speed $O(1)$-competitive independently of the choice of 
algorithm A, as long as A does not leave some processors unoccupied. It turns out that if the instance is 
not Seq-Par, the choice of A matters to obtain competitiveness. Consider for instance a set of $n$ Par-Seq 
jobs, each consisting of a parallel work $\epsilon$ followed by a sequential work 1, arriving at time 0 on one processor; if 
A schedules the jobs one after the other within the set, the makespan will be $(1 + \epsilon)n$ whereas the optimal 
makespan is $n\epsilon + 1$. 

We thus consider the Equi∘Equi strategy, which splits evenly the amount of processors given to each 
set among the uncompleted jobs within that set. Formally, let $N(t)$ be the number of uncompleted sets 
at time $t$, and $N_i(t)$ the number of uncompleted jobs in each uncompleted set $S_i$ at time $t$. At time $t$, 
Equi∘Equi on $p$ processors allots to each uncompleted job $J_{ij}$ an amount of processors $\rho^t_{ij} = \frac{p}{N(t) \cdot N_i(t)}$. 
Note that in the example above, the makespan of Equi∘Equi is optimal, $1 + n\epsilon$. The following section 
shows that indeed the competitive ratio of this strategy is asymptotically optimal (up to a constant 
multiplicative factor). 
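
The allotment rule can be exercised on small Par-Seq instances. The sketch below is an illustration under stated assumptions, not the authors' implementation: each job is a pair (parallel work, sequential work), a parallel phase progresses at a rate equal to the job's allotment, and a sequential phase progresses at rate 1 whatever the allotment. The function names and the instance layout are hypothetical.

```python
from fractions import Fraction


def equi_equi_allotment(alive, p=1):
    """alive: dict set index -> list of uncompleted job indices.

    Returns {(i, j): p / (N * N_i)}, with N uncompleted sets and N_i
    uncompleted jobs in set i.
    """
    n_sets = len(alive)
    return {(i, j): Fraction(p, n_sets * len(jobs))
            for i, jobs in alive.items() for j in jobs}


def simulate_equi_equi(instance, p=1):
    """instance: list of sets, each a list of (par_work, seq_work) pairs.

    Returns the completion time of each set (assumed model: parallel rate =
    allotment, sequential rate = 1).
    """
    remaining = {(i, j): [Fraction(par), Fraction(seq)]
                 for i, jobs in enumerate(instance)
                 for j, (par, seq) in enumerate(jobs)}
    done_at = [Fraction(0)] * len(instance)
    now = Fraction(0)
    while remaining:
        alive = {}
        for (i, j) in remaining:
            alive.setdefault(i, []).append(j)
        share = equi_equi_allotment(alive, p)
        # Progress rate of each job in its current phase.
        rate = {key: (share[key] if par > 0 else Fraction(1))
                for key, (par, seq) in remaining.items()}
        # Advance to the next phase change or job completion.
        step = min((par if par > 0 else seq) / rate[key]
                   for key, (par, seq) in remaining.items())
        now += step
        for key in list(remaining):
            par, seq = remaining[key]
            if par > 0:
                remaining[key][0] = par - rate[key] * step
            else:
                remaining[key][1] = seq - step
            if remaining[key][0] == 0 and remaining[key][1] == 0:
                done_at[key[0]] = now   # time only increases, so the last write wins
                del remaining[key]
    return done_at


if __name__ == "__main__":
    n, eps = 4, Fraction(1, 4)
    print(simulate_equi_equi([[(eps, 1)] * n]))  # one set of n Par-Seq jobs
```

On the single-set example of the text with $n = 4$ and $\epsilon = 1/4$, the simulated makespan equals $1 + n\epsilon = 2$, compared with $(1 + \epsilon)n = 5$ when the jobs are run one after the other.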

5.2 Competitiveness of Equi∘Equi 

Scaling by a factor $p$ each sequential work, again we assume w.l.o.g. that $p = 1$. Consider the Par-Seq 
instance $S' = \{S'_1, \ldots, S'_m\}$ where $S'_i = \{J'_{i,1}, \ldots, J'_{i,n_i}\}$ and each job $J'_{ij}$ consists of a fully parallel phase 
of work $\mathrm{par}(J_{ij})$ followed by a sequential phase of work $\mathrm{seq}(J_{ij})$. Following the proof of LemmaEl we get: 

Lemma 9 $\mathrm{Setflowtime}(\mathrm{Equi}\circ\mathrm{Equi}(S)) \leq \mathrm{Setflowtime}(\mathrm{Equi}\circ\mathrm{Equi}(S'))$. 

The next lemmas are the keys to the result. They reduce the analysis of Equi∘Equi to the analysis 
of the flowtime of Equi for a collection of jobs, which is known from [4] to be $(2 + \sqrt{3})$-competitive when 
all the jobs arrive at time 0. Let $n = \max_{i=1,\ldots,m} n_i$ be the maximum size of a set $S_i$, and let $\alpha =$ ■ 

Lemma 10 There exists a (Par-Seq)* instance $J = \{J_1, \ldots, J_m\}$ of Non-Clairvoyant Batch Job Scheduling, such that: $\mathrm{Equi}(J) = \mathrm{Equi}\circ\mathrm{Equi}(S')[J/S']$, $\mathrm{par}(J_i) \leq \frac{1}{\alpha}\,\mathrm{par}(S'_i)$, and $\mathrm{seq}(J_i) \leq \frac{\ln n}{\ln(1/\alpha)}\,\mathrm{seq}(S'_i)$, 
where $\Sigma[J/S']$ denotes the schedule where $J_i$ receives at any time the total amount of processors allotted to 
the jobs $J'_{ij}$ of $S'_i$ in schedule $\Sigma$. 

Proof. Let $\Sigma = \mathrm{Equi}\circ\mathrm{Equi}(S')$. Let us construct $J_1$ (the construction of $J_i$, $i \geq 2$, is identical). 
Consider the jobs $J'_{1,1}, \ldots, J'_{1,n_1}$ of $S'_1$ in the schedule $\Sigma$. Let $t_1 = 0 < \cdots < t_q = d_1$ (where $d_1$ denotes 
the completion time of $S'_1$ in $\Sigma$), such that during each time interval $[t_k, t_{k+1})$, each job $J'_{1,j}$ remains in 
the same phase; during $[t_k, t_{k+1})$, the number of jobs of $S'_1$ in a sequential (resp. fully parallel) phase is 
constant, say $s_k$ (resp. $N_1(t_k) - s_k$). $J_1$ has $(q - 1)$ phases: 
• if $s_k \geq (1 - \alpha) N_1(t_k)$, the $k$-th phase of $J_1$ is sequential of work $w_k = t_{k+1} - t_k$. 

• if $s_k < (1 - \alpha) N_1(t_k)$, the $k$-th phase of $J_1$ is fully parallel of work $w_k = \int_{t_k}^{t_{k+1}} \frac{1}{N(t)}\,dt$. 
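
The two rules above can be turned into a small routine that builds the phases of $J_1$. This is a sketch under one simplifying assumption that is not in the text: the number $N(t)$ of uncompleted sets is taken to be constant on each interval $[t_k, t_{k+1})$ (otherwise the intervals would have to be refined), so that the parallel work is the interval length divided by $N$. All names are hypothetical.

```python
from fractions import Fraction


def build_aggregate_job(breakpoints, seq_counts, alive_in_set, alive_sets, alpha):
    """Phases of J_1 as a list of ("seq" | "par", work) pairs.

    breakpoints:      t_1 < ... < t_q
    seq_counts[k]:    s_k, jobs of S'_1 in a sequential phase on [t_k, t_{k+1})
    alive_in_set[k]:  N_1(t_k), uncompleted jobs of S'_1
    alive_sets[k]:    N(t), assumed constant on [t_k, t_{k+1}) in this sketch
    alpha:            the threshold parameter of the lemma
    """
    phases = []
    for k in range(len(breakpoints) - 1):
        length = Fraction(breakpoints[k + 1]) - Fraction(breakpoints[k])
        if seq_counts[k] >= (1 - alpha) * alive_in_set[k]:
            # Mostly sequential interval: sequential phase of work t_{k+1} - t_k.
            phases.append(("seq", length))
        else:
            # Otherwise: parallel phase whose work is the integral of 1/N(t).
            phases.append(("par", length / alive_sets[k]))
    return phases


if __name__ == "__main__":
    print(build_aggregate_job([0, 2, 3], [0, 3], [4, 4], [2, 2], Fraction(1, 2)))
```

In the usage line, the first interval has no sequential job and becomes a parallel phase of work $2/2 = 1$; in the second, three of the four jobs are sequential, which yields a sequential phase of work 1.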

$J_1$ is designed to fit exactly in the space allotted to $S'_1$ in $\Sigma$, thus $\mathrm{Equi}(J) = \Sigma[J/S']$. We now have 
to bound the total parallel and total sequential works in $J_1$. Let $K = \{k : s_k \geq (1 - \alpha) N_1(t_k)\}$ and 
$\overline{K} = \{1, \ldots, q - 1\} \setminus K$; by construction, $\mathrm{seq}(J_1) = \sum_{k \in K} w_k$ and $\mathrm{par}(J_1) = \sum_{k \in \overline{K}} w_k$. For each 
$t \in [t_k, t_{k+1})$ with $k \in \overline{K}$, the amount of parallel work of jobs in $S'_1$ between $t$ and $t + dt$ is at least 
$\frac{\alpha N_1(t)}{N(t) \cdot N_1(t)}\,dt = \frac{\alpha}{N(t)}\,dt$. It follows that the amount of parallel work of jobs in $S'_1$ scheduled in $\Sigma$ during 
$[t_k, t_{k+1})$ is at least $\alpha \int_{t_k}^{t_{k+1}} \frac{1}{N(t)}\,dt = \alpha w_k$. Thus, $\mathrm{par}(S'_1) \geq \sum_{k \in \overline{K}} \alpha w_k = \alpha\,\mathrm{par}(J_1)$, which is the claimed 
bound. Now, let $A = \bigcup_{k \in K} [t_k, t_{k+1})$; we have $|A| = \mathrm{seq}(J_1)$. Since the bound on the size of $A$ in the proof 
of Proposition 7 relies on a counting argument (and is thus independent of the amount of processors 
given to the set) and the jobs in $S'_1$ are Par-Seq, the same argument applies and $|A| \leq \frac{\ln n_1}{\ln(1/\alpha)}\,\mathrm{seq}(S'_1) \leq \frac{\ln n}{\ln(1/\alpha)}\,\mathrm{seq}(S'_1)$, which concludes the proof. □ 

Let $J' = \{J'_1, \ldots, J'_m\}$ be the Par-Seq instance of Batch Job Scheduling where each job $J'_i$ consists of a 
fully parallel work of $\mathrm{par}(J_i)$ followed by a sequential work of $\mathrm{seq}(J_i)$. Again, as the amount of processors 
allotted by Equi to each job is a non-decreasing function of time, pushing parallel work upfront can only 
make it worse, thus: 

Lemma 11 $\mathrm{Equi}(J) \leq \mathrm{Equi}(J')$. 

We can now conclude on the competitiveness of Equi∘Equi. 

Proposition 12 Equi∘Equi is $(2 + \sqrt{3} + o(1))\,\frac{\ln n}{\ln\ln n}$-competitive for the Setflowtime minimization problem. 

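Proof. Putting everything together with the analysis of Equi in [3]: 

$$
\begin{aligned}
\mathrm{Setflowtime}(\mathrm{Equi}\circ\mathrm{Equi}(S)) &\leq \mathrm{Setflowtime}(\mathrm{Equi}\circ\mathrm{Equi}(S')) && \text{(Lemma 9)} \\
&= \mathrm{Flowtime}(\mathrm{Equi}(J)) && \text{(Lemma 10)} \\
&\leq \mathrm{Flowtime}(\mathrm{Equi}(J')) && \text{(Lemma 11)} \\
&\leq (2 + \sqrt{3})\,\mathrm{OPT}(J'). && \text{([3, Theorem 3.1])}
\end{aligned}
$$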

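Since $J'$ is Par-Seq, one can schedule first all the parallel work in $J'$ followed by all the sequential phases 
together. The flowtime of the resulting schedule is $\mathrm{par}(J') + \mathrm{seq}(J')$, thus $\mathrm{OPT}(J') \leq \mathrm{par}(J') + \mathrm{seq}(J')$. 
Finally, 

$$
\begin{aligned}
\mathrm{Setflowtime}(\mathrm{Equi}\circ\mathrm{Equi}(S)) &\leq (2 + \sqrt{3})\,(\mathrm{par}(J') + \mathrm{seq}(J')) \\
&\leq (2 + \sqrt{3})\left(\frac{1}{\alpha}\,\mathrm{par}(S') + \frac{\ln n}{\ln(1/\alpha)}\,\mathrm{seq}(S')\right) && \text{(Lemma 10)} \\
&\leq (2 + \sqrt{3})\left(\frac{1}{\alpha} + \frac{\ln n}{\ln(1/\alpha)}\right) \cdot \max(\mathrm{par}(S), \mathrm{seq}(S)) \\
&\leq (2 + \sqrt{3} + o(1))\,\frac{\ln n}{\ln\ln n} \cdot \mathrm{OPT}(S).
\end{aligned}
$$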

□ 
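
The last step of the chain needs $\frac{1}{\alpha} + \frac{\ln n}{\ln(1/\alpha)} = (1 + o(1))\frac{\ln n}{\ln\ln n}$, which holds for any $\alpha$ with $\frac{1}{\alpha} = o(\frac{\ln n}{\ln\ln n})$ and $\ln(1/\alpha) = (1 - o(1)) \ln\ln n$. The sketch below checks this numerically for one such choice, $\alpha = (\ln\ln n)^2 / \ln n$. This value is an assumption made purely for illustration and is not necessarily the one used in the paper. The function is parameterized by $M = \ln\ln n$ to avoid overflow.

```python
import math


def bound_ratio(log_log_n):
    """(1/alpha + ln n / ln(1/alpha)) divided by (ln n / ln ln n).

    Hypothetical choice for this sketch: alpha = (ln ln n)^2 / ln n.
    With L = ln n and M = ln ln n:
      (1/alpha) * (M / L)             = 1 / M
      (L / ln(1/alpha)) * (M / L)     = M / (M - 2 ln M)
    """
    m = log_log_n
    return 1 / m + m / (m - 2 * math.log(m))


if __name__ == "__main__":
    for m in (10, 100, 1000, 10 ** 6):
        print(m, bound_ratio(m))
```

The ratio tends to 1 as $M$ grows, but slowly, which is consistent with the $o(1)$ term in the statement.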